Abstract - The CALICE collaboration is developing calorimeters
for a future linear collider, and has collected a large amount
of physics data during test beam efforts. For the analysis of 
these data, standard software available for linear collider detector 
studies is applied. This software provides reconstruction of raw 
data, simulation, digitization and data management, which is 
based on grid tools. The data format for analysis is compatible 
with the general linear collider software. Moreover, existing 
frameworks such as Marlin are employed for the CALICE 
software needs. The structure and features of the software frame- 
work are reported here as well as results from the application 
of this software to test beam data. 

I. Introduction 

The CALICE collaboration [1] comprises more than
300 physicists and engineers from Europe, America, Asia
and Africa. Its purpose is to carry out a research and
development program on hadronic and electromagnetic calorimeters
for a future Linear Collider at the TeV scale.

Several test beam campaigns (see Table I) were successfully
performed by the collaboration with different combinations
of detectors: a hadronic calorimeter (HCAL) using steel or
tungsten as absorber and scintillator tiles, read out by
silicon photomultipliers (SiPMs), as active material [2], a Si-W
ECAL [3], a Tail Catcher and a digital HCAL based on gaseous
resistive plate chambers (RPCs) [4].

The detectors were accompanied by DAQ systems, triggers 
and drift chambers, which provide tracking information. The 
test beam campaigns resulted in several tens of terabytes 
of data saved on tape. For lasting physics results, it has
to be ensured that the data are treated consistently, despite
the different types of detectors and their different states of
development.

In the following, the efforts made by the CALICE collaboration
to develop the software needed to extract relevant
physics results are presented.

II. The CALICE Data Flow 

At the experimental site, the data are recorded in binary 
format and then transferred directly to the grid storage el- 
ements at DESY Hamburg, which provides a tape back-up, 
and replicated, for safety reasons, at the Computing Centre of 
IN2P3 at Lyon. The event building is also done on the grid 
and generates files in the LCIO format (see Section III). Since
2005, a total of about 50 TByte of raw and processed data and
simulation files have been managed using grid, i.e. LCG, tools [5].

TABLE I
Test beams carried out by the CALICE collaboration

Location   Year        Detectors
DESY       2006        Scintillator/steel HCAL
CERN       2007        Si-W ECAL, scintillator/steel HCAL, Tail Catcher
FNAL       2007/2008   Si-W ECAL, scintillator/steel HCAL, Tail Catcher
FNAL       2009        Scintillator ECAL, scintillator/steel HCAL, Tail Catcher
CERN       2010        Scintillator/tungsten HCAL
FNAL       2010        Digital HCAL

In order to be analysed, the data need to be calibrated 
first. The calibration depends on the detector type and usually 
implies the usage of so-called calibration constants, which are 
extracted offline. 
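
As a rough illustration of how calibration constants enter the
reconstruction, the following sketch converts raw ADC amplitudes
into units of minimum-ionising particles (MIPs). The channel keys,
the constant names (pedestal, adc_per_mip), the numerical values
and the purely linear response are assumptions made for this
example; they do not reflect the actual CALICE data base layout,
and the real calibration chain is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelCalibration:
    """Hypothetical per-channel calibration constants."""
    pedestal: float      # ADC counts measured without signal
    adc_per_mip: float   # ADC counts corresponding to one MIP


def calibrate(raw_hits, constants):
    """Convert raw ADC values to MIP units.

    raw_hits:  dict mapping a channel key to a raw ADC amplitude
    constants: dict mapping a channel key to ChannelCalibration
    Channels without calibration constants are skipped.
    """
    calibrated = {}
    for channel, adc in raw_hits.items():
        calib = constants.get(channel)
        if calib is None:
            continue  # no constants available for this channel
        calibrated[channel] = (adc - calib.pedestal) / calib.adc_per_mip
    return calibrated


# Toy constants and hits, keyed by an invented (layer, cell) pair.
constants = {
    (1, 5): ChannelCalibration(pedestal=100.0, adc_per_mip=50.0),
    (1, 6): ChannelCalibration(pedestal=90.0, adc_per_mip=45.0),
}
raw_hits = {(1, 5): 200.0, (1, 6): 180.0, (2, 1): 300.0}
energies = calibrate(raw_hits, constants)
```

In this toy case the uncalibrated channel (2, 1) is dropped, which
is one possible policy; a real reconstruction might instead flag
such channels.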

For a common storage of all calibration constants, the map- 
pings of the different channels and alignment information, a 
MySQL data base, hosted at DESY, is employed by CALICE. 
Some of the information is written into the data base during 
conversion and some, like calibration constants, at a later stage 
by experts. Once the best available information is written to the 
data base, the corresponding folders are tagged. This ensures 
that the results can be reproduced and cross-checked later 
without problems. 

Access to the data base is granted on the basis of IP ranges. As
soon as a new group joins the collaboration, its IP ranges
are added to the list. Access to the data base may sometimes
be slow from remote locations (such as Japan), but the data base
information can always be dumped to files which are then
stored locally.
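
The idea of granting access by IP range can be sketched with the
Python standard library. The address ranges below are reserved
documentation ranges used as placeholders, and the function name
is hypothetical; the actual access list and its enforcement on the
data base server are not described here.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for
# the ranges of the participating groups.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(address, allowed=ALLOWED_RANGES):
    """Return True if the address lies inside one of the allowed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in allowed)


inside = is_allowed("192.0.2.17")
outside = is_allowed("203.0.113.5")
```

Adding a new group then amounts to appending its networks to the
list of allowed ranges.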

There are two instances of the data base: a read-only
instance, available to everybody, and a read-write instance
for expert work. Experts need to provide a
password in order to be able to modify the contents of the
data base. For test purposes, user folders are created. Once the
expert is satisfied with the results, the information is copied
to the central folders.

A schematic representation of the CALICE data flow is
shown in Fig. 1.

The mass reconstruction is done centrally, upon request. 
This is based on jobs submitted to the grid (involving centres 
from all over Europe), as well as local batch farms. Once the 
data is calibrated (i.e. reconstructed), it can be analysed. 
Fig. 1. Schematic description of the CALICE data flow.



III. The Structure of the CALICE Software 

The CALICE software is based on the ILC software [6]. It
uses the C++ programming language, and the cmake [7] tool
for creating platform independent Makefiles. The documentation
is kept mainly inside the code, using doxygen [8].

The development is done by members of the various groups.
Each group has a designated software contact person, and these
are coordinated by a central software coordinator.

The software is maintained on an SVN server hosted by
DESY [9], and is organised in packages:

• calice_userlib: Contains general purpose classes, used in
the other packages

• calice_reco: The main package, contains the reconstruction
code for the scintillator HCAL, Si-W ECAL and for
the Tail Catcher

• calice_lcioconv: Does the conversion from binary to LCIO
format

• calice_sim: Includes digitisation of simulated events

• calice_run: Contains bash scripts for automatic generation
of steering files, used for reconstruction, noise extraction
and for the digitisation

• calice_torso: Contains HelloWorldProcessor, as a starting
point for new users

In addition, an external package, called RootTreeWriter, is 
used to create ROOT trees for simple analyses. 

The software releases are announced on the main web 
page [10] and on dedicated mailing lists. With every release, 
a tar-ball is created and installed on the grid, for subsequent 
usage. 

The simulation is realised with Mokka [11], which provides
the geometry interface to the GEANT4 [12] simulation toolkit
and is also used for full detector simulation studies. In order
to save time, the digitisation step, performed in the calice_sim
package, is separated from the time consuming simulation step.
The simulation and digitisation of the data runs are done
centrally, upon request.



A. CALICE Event Displays 

For displaying CALICE test beam events, two displays can
currently be used: one based on CED, which is the standard
ILC event display, used also for the full detector, and one based
on ROOT geometry classes, named DRUID [13]. An example
of such an event display is shown in Fig. 2.





Fig. 2. Example of a CED based event display of a pion shower of 20 GeV 
in the CALICE HCAL and Tail Catcher. 



B. Testing 

Before each release, the software is checked to compile and
to produce the expected results. Since mistakes can still
happen, a better solution is to automate the testing and to
make it easy to compare results with those obtained with
previous software tags. For the CALICE software this is done
using ctest [14], a tool that ships with cmake. It can be used
for automatic updating from SVN, for configuring, building,
testing and memory checking, and for submitting results to a
CDash dashboard system. CALICE uses the CDash server installed
by the ILC software group at DESY Hamburg. Apart from
automation, this has the advantage that the outcome of the tests
is stored in a central place, so that their history can be
viewed over time.

First basic ctest scripts are already being used for
several CALICE packages (calice_userlib, calice_reco and
calice_sim).
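
The comparison of results between software tags can be pictured
as a simple regression check. The sketch below is not a ctest
script and not part of the CALICE packages; the quantity names,
the numbers and the tolerance are invented for illustration.

```python
import math


def compare_results(reference, current, rel_tol=1e-6):
    """Return the names of quantities that differ between two tags.

    reference, current: dicts mapping a quantity name to a number.
    A quantity missing on either side also counts as a difference.
    """
    failures = []
    for name in sorted(set(reference) | set(current)):
        if name not in reference or name not in current:
            failures.append(name)
        elif not math.isclose(reference[name], current[name], rel_tol=rel_tol):
            failures.append(name)
    return failures


# Invented summary numbers for a previous tag and two candidate tags.
reference = {"mean_energy_mip": 812.4, "n_hits": 153.0}
candidate_same = {"mean_energy_mip": 812.4, "n_hits": 153.0}
candidate_changed = {"mean_energy_mip": 815.0, "n_hits": 153.0}

no_failures = compare_results(reference, candidate_same)
some_failures = compare_results(reference, candidate_changed)
```

A test driver would typically report success only if the returned
list is empty.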

IV. Application of the CALICE Software - 
PandoraPFA 

In the early years of the collaboration, the software
was written with the immediate needs of each specific
group in mind. As the groups evolved, increasing emphasis was
placed on modularity and flexibility, since new detectors need
to be integrated easily. In addition, CALICE profits from the
close collaboration with the ILC core software developers.
The advantages of this integrated strategy are underlined
by a recent analysis in which test beam data were analysed
using tools developed for the full detector studies [15],
namely the Pandora Particle Flow Algorithm (PFA) [16].

Fig. 3. Difference between the recovered energy and the measured energy for
the 10 GeV neutral hadron at 5 cm distance from the 10 GeV charged hadron.

Fig. 4. Difference between the recovered energy and the measured energy
for the 10 GeV neutral hadron at 30 cm distance from the 10 GeV charged
hadron.

The aim of building a highly granular calorimeter is
the capability to measure the details of hadron showers and
ultimately to recover neutral hadron energies in the vicinity of
charged hadrons. This leads to an improved overall jet energy
resolution ($\sigma_E/E \approx 30\%/\sqrt{E\,(\mathrm{GeV})}$),
since the energy of charged particles can be measured in the
tracking detectors with much higher resolution than in the
calorimeters. The resolution is degraded by the false assignment
of hits to overlapping particle showers from charged and neutral
particles. This depends on the deposited energy in the
calorimeter as well as on the distance between the showers.
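
To make the quoted resolution scaling concrete, the short sketch
below evaluates a relative resolution of 30% divided by the square
root of the energy in GeV. The function name and the example
energy are chosen here for illustration; only the stochastic term
from the text is modelled, without constant or noise terms.

```python
import math


def relative_jet_energy_resolution(energy_gev, stochastic_term=0.30):
    """Relative resolution sigma_E / E = stochastic_term / sqrt(E / GeV)."""
    if energy_gev <= 0:
        raise ValueError("energy must be positive")
    return stochastic_term / math.sqrt(energy_gev)


# Example: a 100 GeV jet.
energy = 100.0
relative = relative_jet_energy_resolution(energy)  # fraction of E
absolute = relative * energy                       # in GeV
```

For a 100 GeV jet this corresponds to a relative resolution of 3%,
i.e. about 3 GeV.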

To study these effects, test beam data samples with charged
pions of energies from 10 to 50 GeV, taken during the CERN
test beam runs in 2007, have been used. The aim
is to study how the energy recovery capability of the PFA
in events containing two pions depends on the pion energies
and on the distance between them. The two-pion events have been
constructed by overlaying two single pion events, whose
energies have been measured with the standard procedure in
the calorimeter, and varying their energies and the distance
between the shower axes. Additionally, it has been assumed that
one of the pions is neutral, and it has been studied how well
the energy of this pion is recovered by the PFA, after having
mapped the event topology to a Linear Collider detector geometry.
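
The overlay of two single pion events can be sketched as follows.
This is a simplified illustration, not the procedure used in the
analysis: hits are represented as a mapping from an invented cell
index (i, j, layer) to an energy, the second shower is displaced
by a whole number of cells, and energies falling into the same
cell are summed.

```python
from collections import defaultdict


def overlay_events(first_hits, second_hits, shift_cells):
    """Overlay two single-particle events into one two-particle event.

    first_hits, second_hits: dicts mapping (i, j, layer) -> energy
    shift_cells: (di, dj) lateral displacement applied to the second
                 event, in units of cells
    """
    merged = defaultdict(float)
    for cell, energy in first_hits.items():
        merged[cell] += energy
    di, dj = shift_cells
    for (i, j, layer), energy in second_hits.items():
        merged[(i + di, j + dj, layer)] += energy
    return dict(merged)


# Toy events with two hits each (energies in arbitrary units).
charged = {(0, 0, 1): 1.5, (1, 0, 2): 0.5}
neutral = {(0, 0, 1): 2.0, (-1, 0, 2): 1.0}

overlapping = overlay_events(charged, neutral, shift_cells=(0, 0))
separated = overlay_events(charged, neutral, shift_cells=(1, 0))
```

With no shift the two showers share a cell and their energies add
up there; with a shift the hits stay separate, which is the
situation in which a particle flow algorithm has an easier task.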

Fig. 3 shows the difference between the recovered energy
and the previously measured energy of a 10 GeV neutral pion
in the vicinity of a 10 GeV charged pion at a distance of
5 cm. The distribution is very broad and degrades even further
for higher charged pion energies. As a comparison, Fig. 4
shows the difference between the recovered energy and the
previously measured energy for pions of the same energy,
but at a distance of 30 cm. The width of the distribution is
considerably smaller, which shows that the expected behaviour
is reproduced by the algorithm. Both plots show not only the
test beam data distribution, but also the MC predictions
of the LHEP model and the QGSP_BERT model. The simulation
reproduces the results of the test beam data
very well. A summary plot showing the mean of the difference
between the recovered energy and the measured energy as a
function of the distance between the shower axes for two
different charged pion energies is given in Fig. 5. As
discussed, the confusion depends on the radial distance between
the showers, i.e. the showers overlap more at smaller distances.
Again, good agreement between data and MC predictions is visible,
which shows that PandoraPFA is a reliable reconstruction
program for a full size detector.

Fig. 5. Mean difference between the recovered energy and the measured
energy for the 10 GeV neutral hadron as a function of the distance to the
10 GeV (solid) and 30 GeV (dashed) charged hadron.
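
The quantities discussed here, the mean and the width of the
difference between recovered and measured energy, can be computed
as in the following sketch. The energies below are invented toy
values, not CALICE data, and the use of the population standard
deviation as the width is an assumption of this example.

```python
from statistics import mean, pstdev


def summarise_recovery(recovered, measured):
    """Return (mean, width) of recovered minus measured energy.

    recovered, measured: equally long sequences of energies in GeV,
    one entry per event.
    """
    if len(recovered) != len(measured):
        raise ValueError("need one recovered and one measured value per event")
    differences = [r - m for r, m in zip(recovered, measured)]
    return mean(differences), pstdev(differences)


# Toy values for a 10 GeV neutral hadron at one shower distance.
measured = [10.0, 10.0, 10.0, 10.0]
recovered = [9.0, 11.0, 10.5, 7.5]
mean_difference, width = summarise_recovery(recovered, measured)
```

Repeating this for each shower distance gives one point per
distance, in the spirit of the summary shown in Fig. 5.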

V. Conclusions and Outlook 

The CALICE collaboration has carried out several test beam
campaigns over the last years, and the analysis of the data
requires powerful software tools. These cover the data
reconstruction and analysis as well as the data management.
The CALICE software makes use of the developments done
for the ILC analysis software and is fully integrated into
this framework. In particular, Marlin processors are used for
the analysis and Mokka as the simulation framework. The
worldwide computing grid is heavily used both for data storage
and data processing. The CALICE software has been shown to
scale to large data sets during years of test beam data analysis.
A study of the PandoraPFA algorithm has been presented
as an example of the successful application of the CALICE
software in data analysis, and it shows that PandoraPFA provides
a reliable reconstruction for a full size experiment.

The next step for the development of the CALICE software 
is the integration of the technological prototypes as well as 
the Digital HCAL (DHCAL) and the Semi-Digital HCAL 
(SDHCAL). Integration means for example the usage of the 
LCIO data format and the common CALICE data base. This 
is necessary, since the DHCAL started taking test beam data 
recently and the SDHCAL will start data taking in 2011. 
Furthermore, the second generation of the CALICE DAQ is 
currently under development and has to be fully integrated into 
the CALICE software. 